I have really been enjoying the Beautiful Soup tutorial series so far, so I decided to put what I learnt into practice and made a website that reads the TV program of the top channels in my country (Greece) and checks whether a movie is playing that day. If so, it parses the title, the time, and the channel it is on, and stores them in a .txt file. Since I couldn't be bothered to write code in JavaScript, I used my favorite language (C++) to write a program that generates the .html file (I know this is pretty redundant, but it was something I knew how to do, so I did that). Then I ran the C++ program, uploaded the index.html with the .css to GitHub, and made it a GitHub page. Finally, so that the site stays up to date every day, I wrote a simple shell script that runs the Python file to parse the data, compiles and runs the .cpp file to write the index.html file, and pushes the changes to GitHub, where the page gets updated (for convenience I also linked a .tk domain so that it would be easier to type).
The site is: http://tipaizei.tk/ which simply translates to "What's on".tk
Here is the code:
Note: There was an issue when I first tried to post this due to the Greek letters, so I replaced them all with the word GreekLet.
for channel in channels:
    #print(channel.string)
    channelList.append(channel.string)

chnl = ''

for tr in soup.find_all('tr'):
    if tr.find('span', {'class':'program__channel-name'})!=None:
        #print(tr.find('span', {'class':'program__channel-name'}).string)
        chnl = tr.find('span', {'class':'program__channel-name'}).string

    if(chnl=='OTE Cinema 1 HD'):
        break
    for movie in tr.find_all('td', {'class':'movie'}):
        name = movie.find('span',{'class':'program__show'}).find('a').string
        time = movie.find('span',{'class':'program__hour'}).string
        #print(chnl)
        out.write(chnl+'\n')
        #print(time)
        out.write(time+'\n')
        #print(name+'\n')
        out.write(name+'\n')

#print("SCRIPT_END")
out.write("SCRIPT_END")
#for movie in soup.find_all('td', {'class':'movie'}):
#    name = movie.find('span',{'class':'program__show'}).find('a').string
#    time = movie.find('span',{'class':'program__hour'}).string
#    if movie.find('span',{'class':'program__show'}).find('a').string == None:
#        print("efwfqwfehejfgadsgfkasgfoyuwfegdo22g3iUY@#$YUU!$U!$IF$!$IU$")
#    print("\n"+time)
#    print(name)
######### WITHOUT WRITING TO FILE ########
'''
for channel in channels:
    #print(channel.string)
    channelList.append(channel.string)

chnl = ''

for tr in soup.find_all('tr'):
    if tr.find('span', {'class':'program__channel-name'})!=None:
        #print(tr.find('span', {'class':'program__channel-name'}).string)
        chnl = tr.find('span', {'class':'program__channel-name'}).string

    if(chnl=='OTE Cinema 1 HD'):
        break
    for movie in tr.find_all('td', {'class':'movie'}):
        name = movie.find('span',{'class':'program__show'}).find('a').string
        time = movie.find('span',{'class':'program__hour'}).string
        print("\n"+chnl)
        print(time)
        print(name)
'''
You can uncomment the print statements so that you also get to see the info in the terminal (it is in Greek, though).
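The script above relies on a `soup` object and a page layout that are not shown in the post. As a rough, self-contained sketch of the same traversal, the snippet below runs it against a small piece of made-up HTML. The sample markup, the channel and movie names, and the helper name `find_movies` are all assumptions for illustration; only the class names (`program__channel-name`, `movie`, `program__show`, `program__hour`) come from the code above. It uses `get_text()` rather than `.string`, which is a swap on my part to avoid `None` when a tag has nested children.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure the script searches for.
SAMPLE_HTML = """
<table>
  <tr>
    <td><span class="program__channel-name">Channel A</span></td>
    <td class="movie">
      <span class="program__hour">21:00</span>
      <span class="program__show"><a href="#">First Movie</a></span>
    </td>
    <td class="series">
      <span class="program__hour">23:00</span>
      <span class="program__show"><a href="#">Some Series</a></span>
    </td>
  </tr>
  <tr>
    <td class="movie">
      <span class="program__hour">23:45</span>
      <span class="program__show"><a href="#">Late Movie</a></span>
    </td>
  </tr>
  <tr>
    <td><span class="program__channel-name">Channel B</span></td>
    <td class="movie">
      <span class="program__hour">22:15</span>
      <span class="program__show"><a href="#">Second Movie</a></span>
    </td>
  </tr>
</table>
"""


def find_movies(page_html):
    """Return (channel, time, title) tuples for every cell marked as a movie."""
    soup = BeautifulSoup(page_html, "html.parser")
    movies = []
    channel = ""
    for tr in soup.find_all("tr"):
        # Remember the most recent channel name; rows without one reuse it.
        name_tag = tr.find("span", {"class": "program__channel-name"})
        if name_tag is not None:
            channel = name_tag.get_text(strip=True)
        for movie in tr.find_all("td", {"class": "movie"}):
            hour = movie.find("span", {"class": "program__hour"})
            link = movie.find("span", {"class": "program__show"}).find("a")
            movies.append(
                (channel, hour.get_text(strip=True), link.get_text(strip=True))
            )
    return movies


for channel, time, title in find_movies(SAMPLE_HTML):
    print(channel, time, title)
```

On the real site you would first download the page (for example with `urllib.request` or `requests`) and pass its HTML to the function instead of `SAMPLE_HTML`.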
Here is the .cpp file that writes the index.html file:
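To give a rough idea of what that step involves, here is a minimal Python sketch (not the C++ program itself) that reads the parsed data back and builds a bare-bones HTML list. It assumes the layout the parsing script writes: channel, time, and title on consecutive lines, ending with `SCRIPT_END`. The function names `read_entries` and `render_page`, the sample data, and the HTML layout are placeholders for illustration.

```python
import html


def read_entries(lines):
    """Group lines into (channel, time, title) tuples until SCRIPT_END."""
    entries = []
    block = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "SCRIPT_END":
            break
        block.append(line)
        if len(block) == 3:
            entries.append(tuple(block))
            block = []
    return entries


def render_page(entries):
    """Build a very plain HTML page with one list item per movie."""
    items = "\n".join(
        f"    <li>{html.escape(time)} | {html.escape(channel)} | {html.escape(title)}</li>"
        for channel, time, title in entries
    )
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n  <ul>\n"
        f"{items}\n"
        "  </ul>\n</body>\n</html>\n"
    )


# Sample data in the same shape as the parser's output (made up).
sample = "Channel A\n21:00\nFirst Movie\nChannel B\n22:15\nTom & Jerry\nSCRIPT_END"
entries = read_entries(sample.splitlines())
print(render_page(entries))
```

Escaping with `html.escape` keeps titles that contain characters such as `&` or `<` from breaking the generated page.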
Also here is the shell script that I wrote. It isn't anything fancy but I thought that I might as well include it.
#!/bin/bash
echo "Parsing Data from the Internet"
python3 parse.py
echo "Updating index.html"
g++ update.cpp
./a.out
echo "Pushing update to GitHub"
git add index.html
git status
git commit -m "Daily Update"
git push
echo "Successfully updated index.html"
I hope that you find my code useful and that you are inspired to work on your own personal projects. Special thanks to Harrison for all the great videos on Beautiful Soup.
Awesome, thanks for sharing your work with us!
-Harrison 8 years ago
Sure no problem!
-Panagiotis Petridis 8 years ago
Also here's the page on github: https://github.com/PanagiotisPtr/tipaizei.github.io
-Panagiotis Petridis 8 years ago